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Optimal phase space projection for noise reduction 
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In this communication we will re-examine the widely studied technique of phase space projection. 
By imposing a time domain constraint (TDC) on the residual noise, we deduce a more general 
version of the optimal projector, which includes those appearing in previous literature as subcases 
but does not assume the independence between the clean signal and the noise. As an application, 
we will apply this technique for noise reduction. Numerical results show that our algorithm has 
succeeded in augmenting the signal-to-noise ratio (SNR) for simulated data from the Rossler system 
and experimental speech record. 

PACS numbers: 



I. INTRODUCTION 

Due to its simplicity in implementation and efficiency 
in computation, noise reduction based on phase space 
projection has been widely studied in previous literature. 
For example, Broomhead and King ^ advocated that, 
in case of white noise, via singular value decomposition 
(SVD), one could extract qualitative dynamics from ex- 
perimental (noisy) time series by removing the empirical 
orthogonal functions (EOFs) [13 of the trajectory matrix 
which correspond to the noise components. To deal with 
the case of colored noise, Allen and Smith proposed 
a more general method, which would statistically pre- 
whiten colored noise by introducing a transformation to 
the covariance matrix of noise. In general, phase space 
projection based on these methods would not operate 
on the EOFs that span the signal-plus-noise subspace, 
therefore those operations could achieve a lowest possi- 
ble distortion for the clean signal, but at the price of 
a highest possible residual noise level [ij. To obtain an 
optimal tradeoff between signal distortion and residual 
noise so as to minimize the overall distortion, Ephraim 
and Trees proposed the time domain constraint (TDC) 
projector |j] , which improves the performance of the ex- 
isting methods by imposing a constraint on the residual 
noise, and which also includes the existing methods as 
its subcases. As a generalization, some authors also ex- 
tended the TDC projector to the cases with colored noise 

m 

Usually, these authors will make two assumptions con- 
cerning the experimental time series. The first assump- 
tion is that the time series is stationary and ergodic, and 
the second one is that the noise components are indepen- 
dent of the clean signal. In this communication we will 
re-examine the idea of the TDC projector and deduce a 
more universal version. We will also show that, with the 
first assumption, the second is not necessary in general. 

The remainder of this article will go as follows: In the 



second section we will introduce the idea of the TDC 
projector. Based on the assumption that the noisy time 
series is stationary and ergodic, we will obtain the opti- 
mal TDC projector for a trajectory matrix in the sense 
of minimizing signal distortion subject to a permissible 
noise level. In the third section we will apply the optimal 
TDC projector to simulated data from the Rossler sys- 
tem and experimental speech data. We will also compare 
the performance of the projectors under different TDCs. 
Finally, a conclusion is available to summarize the whole 
article. 



II. MATHEMATICAL DEDUCTION 

Given a noisy time series s = {si}^i, we suppose 
that the corresponding clean signal and the additive 
noise components are d = {di}fLi and n — {ni}f^-^ re- 
spectively, thus for each noisy data point Si, we have 
Si = di + Tii. In addition, we assume {si}f^i are (weakly) 
stationary and ergodic so that its expectation exists and 
its variance is finite, while its (auto)covariances only de- 
pend on the time difference between the subsets. 

Following the definition in we could construct a 
{M — m + 1) X m trajectory matrix S from {si}f£i by 
letting 



S = 
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with M — m -\- I > TO. Similarly, we could also obtain 
the corresponding trajectory matrices D and N for com- 
ponents {di}f£i and {ni}f£-^ respectively, and we have 
S = D-(-N. 

For the purpose of noise reduction, we introduce a pro- 
jection operator H on the trajectory matrix S of noisy 
signal, through which wc could obtain a matrix Z — SH. 
We define Rq = Z - D = D(H - I,„) + NH as the ma- 
trix of residual signal, where the term D(H — I„J means 
signal distortion and the term NH is residual noise. With 
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the intention of data augmentation, we would require 
to achieve as smah signal distortion as possible. Thus 
H = Im would be an intuitive choice. However, in sit- 
uations such as speech communication, one would also 
require a permissible residual noise level of the noisy sig- 
nal, and the objective becomes to minimize signal dis- 
tortion subject to achieving a permissible residual noise 
level. Thus if the initial data does not fulfil this require- 
ment, one has to reduce the initial noise level at the price 
of introducing possible signal distortion. Similar to the 
idea proposed in Q , here we impose a time domain con- 
straint (TDC) /i on the term of residual noise NH and 
treat R = D(H — I^) -|- /zNH as the part that requires a 
minimal distortion, where /i^ G [0, +00) is the Lagrange 
multiplier determined by the permissible noise level from 
the practical demand (see Eq. (33) of j^l a-nd the related 
discussions therein). Thus our objective will be to min- 

■ M 



imize the average energy = m J /-^^ of the data 

set r = {ri}fLi that (approximately) corresponds to the 
matrix R. If M ^ to, then 



1)to 



tr(R^R), 



(1) 



where tr{-) means the trace of a square matrix, R-^ de- 
notes the transpose of the matrix R. 

Discarding the constant term tr (D'^'D) in ir(R^R), 
we have 

ir(R^R) tr(H^(D + ^N)^(D -I- ^N)H) 

-2tr(H^(D + ^N)^D). (2) 

Taking to as a constant ^l4\, for the minimization prob- 
lem, by requiring 9tr(R^R)/9H = 0, we would have 
(D + /iN)'^(D + ^N)H-(D + ^N)^D = according to 
the differential rules in, for example, 0[ p. 472]. There- 
fore the optimal projector 



{(D + /^N)^(D + /^N)} '{B + iiNf-D. (3) 



With the noise components, dtr {R'^R)/dIi'^ ^ 2(D -|- 
/iN)"^(D -t- yuN) is positive definite, which confirms that 
the extremum taken at Hmin is a minimum. The corre- 
sponding minimal value 

tr™i„(R^R) = tr(D^D) - tr(D^(D + AiN)H„,i„). (4) 

But note that (D + /iN)-^ is not a square matrix, its 
(ordinary) inverse matrix usually is not defined, thus we 
could not cancel the terms of (D -I- fiN)"^ in Eq. Q. 

Since S = D + N, we could also 
write Eq. in the form of 

H,„i„ = {(S + (/i-l)N)^(S-l-(M-l)N)} 
x(S + (^-l)N)^(S-N). 



(5) 



If we assume the clean signal and the noise components 
are independent, statistically we have D'^N = N'^D = 
as M 00, hence S^S = D^D -I- N^N, and Eq. ® 
reduces to 



Hniin — "I S S 



(a* -1)N'N^ (S^S-N^N). (6) 



Let Cs, Cat denote the covariance matrices oi {si}f^i 
and {ni}fLi respectively, by assuming the expectation 
values E{s) = E{n) = 0, we have Cg = S^S/{M-m + l) 
and Cn = N^N/(A/ - m -h 1) as M ^ 00. Thus Eq. ® 
would be expressed as 



= {C5 + (/i2-l)C^} '(Cs-Cn), (7) 



which is consistent with the result in, for example, Eq. 
(3) of (7|. But note that here we use /z^ to substitute for 
the multiplier fj, in Eq. (3) of 0. Also note that H^y„ 
in our work is the transpose of that in Eq. (3) of 
this is because the trajectory matrices in our work are 
essentially the transpose of those in H, 0, ■ 

In many situations, although the noise components 
are theoretically uncorrelated to the clean signal, nu- 
merical calculations often indicate that the assumption 
D^N = N^D = does not hold strictly for finite data 
sets. As a more rigorous form, Eq. I^l needs no inde- 
pendence assumption between the noise components and 
the clean signal. Thus this expression is a further gener- 
alization of previous studies. 



III. NUMERICAL RESULTS 

We note that the trajectory matrices previously intro- 
duced are all Hankel matrices. Take trajectory matrix 
S of the noisy signal as an example, its entries satisfy 
S(*,i) = S(fc,Z) if i -I- j = k + I, where S(i,j) denote 
the element of matrix S on z-th row and j-th column. 
However, matrix Z — SH usually will not be a Hankel 
matrix, and we may have many ways to obtain the fil- 
tered (or projected) signal {zi}f£i. In our work we use 
the method of secondary diagonal averaging to extract 
signal from the matrix Z, which takes the average of the 
elements along the secondary diagonals of matrix Z as 
the filtered signal {zi}f£i (for details, see 0, p. 24]), 
and thus can form a new trajectory (Hankel) matrix Z^ 
horn {zi}f£^. Golyandina aZ. prove that this method is 
optimal among all Hankelization procedures in the sense 
that the matrix difference Z^ — Z has minimal Frobenius 
norm |^ p. 24, p. 266]. 

We adopt the signal-to-noise ratio (SNR) as the met- 
ric to evaluate the performance of our noise reduction 
scheme, which is defined (in dB) as 0,|8| 



SNR = 10 log 



10 



\\d\r 

\\z-df 



(8) 
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TABLE I: Performance of TDC projectors for the Rossler system (in unit 
of dB). 

TDC /i Additive white noise Additive colored noise Multiplicative noise 

20 25.50 ± 0.09 20 20.92 ± 0.05 20 21.11 ±0.10 

0.0 10 15.95 ±0.11 10 10.89 ± 0.04 10 ^ 11.14 ± 0.09 

0-^5.88 ±0.06 -> 0.87 ±0.04 ^1.16 ±0.10 

20 ^ 25.80 ±0.10 20 21.07 ± 0.05 20 ^ 23.23 ± 0.24 

0.5 10 ^ 17.74 ±0.15 10 ^ 11.64 ±0.06 10 -> 14.42 ± 0.23 

0-^9.71 ±0.10 0^3.12 ±0.08 -» 6.97 ± 0.22 

20 ^ 26.27 ±0.11 20 21.17 ±0.05 20 ^ 24.44 ± 0.34 

1.0 10 18.29 ±0.16 10 11.89 ±0.07 10 ^ 16.12 ± 0.32 

0^10.10 ±0.09 ^4.15 ±0.09 -> 9.56 ± 0.33 



where \\df = E and \\z - df = E (z^ - d,)^. 
1=1 1=1 
We first apply our algorithm to a simulated data set, 
which is generated from the x component of the Rossler 
system 

X = -{y ± z) 

y ==:x + ay (9) 
z — b + (x — c)z 

with parameter a = 0.15, b — 0.2 and c = 10. The data 
is evenly sampled for every 0.1 time units. We generate 
10, 000 data points and discard the first 1000 to avoid 
transition. To construct the trajectory matrices, we will 
set the window size m = 20 . 

Let {si}f^i and {di}f^^ again denote the noisy and 
clean signals respectively. We consider adding three 
types of noise contamination to the clean data. The first 
one is additive white noise {£,i}f^i (so that Si = di ±^j), 
which follows the normal Gaussian distribution A^(0, 1). 
The second one is additive colored noise (so that 

Si ~ di ± ryj, which, as an example, is produced from a 
third order autoregressive (Ai?(3)) process in the form of 
r^j = 0.877j_i — 0.577j_2 + 0.&r]^_^ + ^j, where variable ^ 
follows the normal distribution A^(0, 1). The last one is 
multiplicative noise {CjC^ili^i (so that = (1 ± C,i)di). 
As an example, we let = t^I, where {rjj}^^^ is from 
the previous AR{i) process, then the noise component 

{Ci'^i}f=i is correlated to the clean data {rfilfl^. 

By varying the magnitude of the introduced noise, we 
have the initial noise level be 20 dB, 10 dB, dB respec- 
tively, and for each noise level, we will include 10 different 
noise samples from the same process in calculation. We 
will also study the performance of the projectors under 
different constraints. As examples, we let TDC /i = 0, 
0.5 and 1 separately. TDC /i = will lead to the least- 
squares (LS) projector based on the SVD technique that 
appeared in, for example, We would need to 

specify the dimension of the signal-plus-noise subspace 
so as to group the EOFs and eigenvalues that correspond 
to the noisy signal and remove the complementary noise 



subspace, which is essentially related to the problem of 
choosing the embedding dimension for embedding recon- 
struction from a scalar time series (see the discussion in 
13] )■ Thus here we adopt the criterion of false nearest 
neighbor a method proposed for selection of appro- 
priate embedding dimensions. To apply this criterion 
in calculation, we utilized the codes implemented in the 
TISEAN package and found that the proper dimen- 
sion size K of the signal-plus-noise subspace is 5 in our 
cases. For /i = 1, we will obtain the well- know linear 
minimum mean-squared-error (LMMSE)^ projector (de- 
tailed introductions available in, e.g., [iJl)- After all of 
the calculations, we finally list the performance of these 
TDC projectors in Table |l| For better comprehension of 
the presented results, we provide the waveforms of all of 
the data listed in Table ^ as the supplementary material 
|l5l |. To keep our presentation concise, here we only take 
out the raw data contaminated with dB additive white 
noise as an example and depict its waveform of in panel 
(a) of Fig. 1^. For comparison, we also plot the aug- 
mented data with TDC= 0,0.5 and 1 in panel (6), (c) 
and (d), whose mean noise levels are 5.88, 9.71 and 10.10 
dB correspondingly. 

From Table n we see that for the Rossler system, our 
algorithm works for all of the three types of contami- 
nation. But the data augmentation for additive colored 
noise is not as obvious as those for additive white noise 
and multiplicative noise (the possible explanation is ex- 
plored in the appendix). We also see that, in general, the 
LMMSE projector has better performance than that of 
the LS projector in the sense that it can achieve better 
SNR as defined in Eq. ©. 

We then apply our algorithm to a very noisy speech 
(vowel) data (with 8,000 data points), which is sampled 
at 44 kHz and quantized to 16 bits. In this case we only 
know the background noise measured in the period with- 
out the signal. It would be preferred if we could produce 
a set of samples that mimic the behavior of the under- 
lying noise. Here we adopt the pseudo-periodic surro- 
gate (PPS) algorithm ^ to generate 9 surrogates based 
on the original background noise. With these data sets, 
the initial SNR of the speech data is estimated to be 
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Simulated data with dB white noise 

40 I • ' • 




data index 

Augmented data with TDC |j,=0.5 

40 1 • ' • ■ 




_40l • ' • • 1 

2000 4000 6000 8000 
data index 

FIG. 1: (a) Time series from the Rossler system contaminated 
time series by TDC projectors with /i = 0, 0.5 and 1 separately. 

—0.32 ± 0.18 dB via Eq.®. To introduce phase space 
projection to the speech data, we let the window size 
TO = 30 and set the dimension size of signal-plus-noise 
subspace to be = 8, and then apply the TDC projec- 
tors H to its trajectory matrix. For the LS projector 
{fi = 0), the augmented SNR= 4.36 ±0.41 dB. While for 
TDC fi — 0.5 and 1, the corresponding SNRs increase to 
6.28 ± 0.61 dB and 6.97 ± 0.66 dB respectively. As an 
illustration, we plot the waveforms of the original speech 
record and three projected data under different TDCs in 
Fig. from which we can see that, the LMMSE projec- 
tor (/i = 1) would lead to a smoother speech waveform 
(panel (d)) than that of the LS projector (panel (b)). 
Although the speech data output from the LMMSE pro- 
jector has lower (signal) magnitudes than those of the 
speech record from the LS projector, it is still preferred 
to its rival in speech communication since a smoother 
data will usually bring better communication quality. 



IV. CONCLUSION 

In this communication we re-examined the noise reduc- 
tion technique based on phase space projection. By im- 
posing a constraint on the residual noise, we deduced the 
optimal time domain constrained projector in the sense 
of minimizing signal distortion subject to a permissible 
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with dB additive white noise; (b), (c) and (d) Augmented 



noise level. We also showed that, in general we need 
not assume independence between clean signal and noise 
components as was previously done. This viewpoint was 
confirmed by our numerical results (see the third column 
of the calculation results in Table P). 

Appendix 

Here let us examine the metric of signal-to-noise ratio 
(SNR) in more detail. According to the definition in 
Eq. ©, SNR = lOlogio / \\z ~ df ,where \\df ^ 

M M 

2 p _ ^||2 ^ ^{zi- Note that = 

4=1 1=1 

tr(D^D)/TO and \\z - df = tr{{Z - Bf{Z - D))/m as 
M — > cx), thus 

SNR = 10 logio tr(D^D)-10 logn, tr((Z - D)^(Z - D)) . 

Since Z = SH, we have 

tr((Z-D)^(Z-D)) =ir(H^S^SH) - 2tr(D'^SH) + 
tr (D'^'D). For the case that the noise and the clean 
signal are independent, substituting the optimal pro- 
jector Hniin into the expression, it can be shown that 
tr„,in((Z-Df (Z-D)) =ir(D^D) - ir(H^inD^D). 
For simplicity, we assume the expectation values 
E{d) = E{n) = 0, then Co = D^D/(A/ - m -f 1) 
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FIG. 2: (a) Original speech record; (b), (c) and (d) Speech dat; 

and Cjv = N'^N/(M - m + 1) as M oo, where 
Cjj and Cjv are the covariance matrix of the clean 
signal and the noise respectively, and Hi„in can be 
expressed in the form of Eq. 0, or eqiiivalently, 
Hmin = {Cd + /i^Cjv}^^C£). Therefore in this case, we 
have tr„,in((Z - D)^(Z - D)) ^triCo) - triU^i^Co), 
thus the maximal SNR can be expressed by 

SNR 

max 

-tri{CD+^J:'CN}-'Cl)). (10) 

Through the SVD technique ) Cd can be written as 
Cd = V^jAijVl^ , where V^i is the normalized eigen- 
vector matrix of Cu , and Ajj is a diagonal matrix whose 
non-zero elements are the eigenvalues of Cd (in fact 
V^V^i = I™ and CdVd = Vd-Ad). Similarly, we 
have Cjv = VatAatV^. Let Yjy — VdPdn (for better 
comprehension, Pdn can be thought as a kind of projec- 
tion from Vat on Yd), then Cn = ^dPonAnP^^j^V]^. 
Substitute it into Eq. H1U|I . we have 

SNR 

max — lOlogio triAo) - lO\ogio{tr{AD) 

-tr{{AD + H^PDNANPlNr^Al)). 



(b) Augmented data with TDC |i=0 
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output from TDC projectors with n — 0, 0.5 and 1 separately. 

If the noise components are white, we have Apf — a'^lm 
(with a being the standard deviation of the noise pro- 
cess) and Vtv = V^j (i.e. Pdn = 'im) M- However, for 
the case of colored noise, usually Pdn I-m- Instead 
it is possible that the absolute value of the elements in 
Pdn are relatively small. Thus even for the same clean 
signal {di}f^ , the SNR^ax performance of the colored 
noise might be much worse than that of the white noise. 
This fact might explain the observation that the results 
in Table. I are not that promising for the additive colored 
noise. 
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